Protein Science
Wiley
Preprints posted in the last 30 days, ranked by how well they match Protein Science's content profile, based on 221 papers previously published here. The average preprint has a 0.07% match score for this journal, so anything above that is already an above-average fit.
Richards, D. M.; Zhai, F.; Li, S.; Yu, Q.
Thermal proteome profiling (TPP) and its higher-throughput derivative, the proteome integral solubility alteration (PISA) assay, measure changes in protein thermal stability upon ligand binding or other perturbations and have been widely adopted in drug discovery and biomedical research. Though the PISA workflow is straightforward, key parameters, including detergent concentration, methods for removing denatured aggregates, and temperature range selection, vary across studies and can markedly influence assay outcomes. Yet these factors have not been systematically evaluated, limiting rational experimental design and data interpretation. Here, through a combined use of TPP, PISA, tandem mass tag (TMT)-based multiplexing, and computational simulation, we systematically characterize these parameters based on the melting behavior of ~9,000 proteins. We find that reducing detergent concentration elevates apparent Tm by 1.5-2 °C proteome-wide, and aggregate removal by filtration versus centrifugation further alters measurements. We leverage these observations to optimize PISA and then apply the optimized conditions to identify the aminopeptidase NPEPPS as a previously uncharacterized binding partner of angiotensin II, a key vasoactive peptide hormone in blood pressure regulation. Together, this work provides a general framework for assay design and data interpretation, and extends the utility of PISA beyond small molecules to dissecting peptide-protein interactions, an increasingly important modality in drug discovery.
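The link between a melting-temperature shift and the integral readout that PISA relies on can be illustrated with a minimal sketch. It assumes a simple two-state sigmoidal melting curve; the function names, the slope value, and the 40-64 °C window are hypothetical choices for illustration and are not taken from the preprint.

```python
import math

def soluble_fraction(temp_c, tm_c, slope=1.5):
    """Assumed two-state sigmoid: fraction of protein still soluble at temp_c."""
    return 1.0 / (1.0 + math.exp((temp_c - tm_c) / slope))

def pisa_integral(tm_c, temps):
    """Trapezoidal area under the melting curve over the sampled temperatures."""
    values = [soluble_fraction(t, tm_c) for t in temps]
    return sum(
        (temps[i + 1] - temps[i]) * (values[i] + values[i + 1]) / 2.0
        for i in range(len(temps) - 1)
    )

# Hypothetical temperature window: 40, 42, ..., 64 degrees C
temps = [float(t) for t in range(40, 65, 2)]

baseline = pisa_integral(50.0, temps)  # illustrative Tm without perturbation
shifted = pisa_integral(52.0, temps)   # illustrative Tm raised by 2 degrees C
delta = shifted - baseline             # positive when the protein is stabilized
```

In this toy model the change in area is close to the Tm shift itself when the window covers the whole transition. A narrower or offset window would capture only part of the shift, which is one way to see why temperature range selection affects the readout.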
Cho, Y.; Tsuboyama, K.; Litberg, T. J.; Jung, M. D.; Obisesan, A.; Wang, Q.; Phoumyvong, C. M.; Thibeault, J.; Ovchinnikov, S.; Rocklin, G. J.
Predicting absolute protein folding stability is a long-standing challenge in biophysics, with broad applications in protein design and in understanding genetic variation and evolution. Physics-based simulations have shown limited success at predicting stability and are often computationally intractable, and machine learning methods have been constrained by the lack of sufficiently large experimental datasets. We recently introduced cDNA display proteolysis, a cell-free approach that can measure folding stability for nearly one million protein domains in parallel. Here, we applied this method to measure stability for 1.8 million diverse protein domains 60-80 amino acids in length primarily taken from the MGnify metagenomic database and spanning over 200,000 sequence families. Using this new "MGnify Stability dataset", we developed the predictive models SaProtΔG and ESM3ΔG, which accurately predict absolute folding stability for small domains with root mean squared error of 0.8 kcal/mol over a 6 kcal/mol range (Spearman rank correlation of 0.88). These predictors show high accuracy at predicting effects of substitutions, insertions, and deletions, successfully identify global trends toward higher stability in thermophilic organisms, and improve discrimination of stable and unstable computationally designed proteins. Our results illustrate how megascale biophysical measurements can complement existing evolutionary and structural data to enable accurate absolute stability prediction for small domains.
Zhang, S.; Maddipatla, S. A.; Vedula, S.; Marx, A.; Bronstein, A. M.
β-turns are among the most common structural motifs in proteins, yet their conformational dynamics and sequence determinants remain incompletely understood. Here we present a data-driven classification and dynamic analysis of β-turn conformations using large-scale molecular dynamics trajectories from the mdCATH database. Clustering of backbone dihedral angles using a cross-bond Ramachandran representation identifies six β-turn types, including a previously uncharacterized hybrid I/I' cluster that combines geometric features of canonical type I and I' conformations. Time-resolved analysis indicates that this hybrid state acts as a transient intermediate state of β-turns. Transitions observed in molecular dynamics simulations closely match NMR ensembles and altlocs detected in X-ray crystal structures, with the most dominant exchanges occurring between type I and II, and between type I' and II' turns. Sequence analysis shows that each turn type exhibits characteristic amino acid preferences at the central residues (i + 1 and i + 2). Within these overall preferences, specific residue pairs display distinct biases toward static or dynamic behavior. Targeted in silico substitutions that interchange dynamic- and static-enriched residue pairs shift the conformational behavior of turns accordingly, providing direct support for these sequence-dynamics relationships. Analysis of flanking secondary-structure environments reveals that structural context further modulates turn flexibility, with strand- and coil-associated turns exhibiting higher dynamic propensity than helix-associated turns. Together, these results reveal how sequence composition and structural context jointly shape the conformational landscape of β-turns.
Peteani, G.; Sgueglia, G.; Lemmin, T.; Chino, M.
Motivation: Protein language models (pLMs) capture evolutionary sequence constraints but are limited in modeling underrepresented functional classes due to training data imbalance. Metalloproteins constitute a fundamental but sparsely represented class in sequence databases. We therefore assess whether structure-conditioned synthetic sequences can be used to specialize pLMs toward metal-binding functionality. Results: We fine-tuned the generalist model ProtGPT2 on synthetic sequences generated by the inverse-folding model ProteinMPNN, constructing training sets with controlled variation in size and diversity. Fine-tuning increased recovery of canonical metal-binding motifs from 43% in the baseline model to 91% in the fine-tuned models. Generated sequences retained high predicted structural confidence and structural similarity to known folds, despite low sequence identity. Analysis of latent representations from ProtGPT2 indicated that fine-tuned models occupy distinct regions of embedding space relative to both the baseline model and structure-conditioned sequences, consistent with partial incorporation of structural constraints while preserving sequence diversity. A multi-step filtering pipeline applied to sequences lacking canonical motifs identified candidate metal-binding sites in four-helical bundle topologies not detected in a non-redundant subset of Protein Data Bank structures or in AlphaFold-predicted proteomes. Availability and implementation: Code, trained models, and datasets are available at: https://doi.org/10.5281/zenodo.18672158 and https://huggingface.co/gsgueglia.
Fonda, B. D.; Murray, D. T.
The Tar-DNA Binding Protein-43 C-terminal region, TDP43LC, has been previously shown to form amyloid-like fibrils with distinct folds in ALS and FTD. In both diseases, proteinaceous inclusions contain TDP43 C-terminal protein fragments as well as phosphorylated TDP43. Here, we use solution NMR to show that soluble phosphomimetic TDP43LC, P-TDP43LC, is structurally similar to wild-type TDP43LC. Disperse P-TDP43LC, like wild-type protein, contains a central helical region flanked by long disordered regions. Despite this similarity, our turbidity measurements, imaging, and kinetic assays show that P-TDP43LC has different aggregation behavior than wild-type protein. Using solid state NMR measurements, we find that phosphomimetic mutations alter the wild-type fibril conformation. Electrostatic repulsion from negatively charged sidechains, despite having little effect on the soluble protein's structure, perturbs amyloid-like fibril formation and selects for a different conformation in vitro. These results shed light on the structural role of TDP43LC phosphorylation in fibril formation in disease. Synopsis: Phosphomimetic mutations at ALS and FTD neurodegeneration-associated sites in an amyloid-forming protein perturb the aggregated structure compared to wild-type protein.
Zafiropoulo, H. R.; Thomas, J. E.; Cortez, N. R.; Apostol, K.; de Sa, A.; Khosravi, R.; Moore, L.; Berndsen, C. E.; Bibel, B.
Species of Bacillus bacteria including Bacillus safensis and Bacillus subtilis are finding increasing uses in biotechnology and bioremediation, thanks in part to their metabolic robustness. Malate dehydrogenase (MDH) is at the heart of central metabolism and thus a better understanding of Bacillus MDH proteins could aid in the optimization of these applications. The MDHs of Bacillus spp. belong to the lactate dehydrogenase (LDH)-like class of MDHs, otherwise known as the MDH3 class. Despite wide prevalence in nature among prokaryotes and archaea, this typically homotetrameric class is understudied compared to the MDH1 and MDH2 classes found in eukaryotes. We therefore recombinantly expressed and purified MDH proteins from two societally relevant Bacillus spp., B. safensis and B. subtilis, and characterized them biophysically (via Size Exclusion Chromatography-Small Angle X-ray Scattering (SEC-SAXS) and Differential Scanning Fluorimetry (DSF)) and enzymatically (via spectroscopic activity assays). As expected based on their high sequence identity, the two MDH orthologs had similar properties in most regards, including a tetrameric structure and high susceptibility to substrate inhibition. However, we uncovered differences in conditional thermal stability, in addition to subtle differences in enzymatic activity that offer insight into the workings of LDH-like MDH. Summary statement: Malate dehydrogenase (MDH) is a fundamental metabolic enzyme, from microbes to mammals, yet comparatively little is known about microbial MDH, especially MDH of the tetrameric MDH3 class. We compare the biophysical and enzymatic properties of two such enzymes from the societally relevant bacterial species Bacillus subtilis and Bacillus safensis, offering useful insight with potential biotechnological implications.
Osumi, K. M.; Murray, D. T.
GFAP is a type III intermediate filament primarily found within astrocytes and is known to maintain proper cell structure and mechanical strength. Mutations in GFAP are implicated in the pathology of Alexander disease, a neurodegenerative disease characterized by cytoplasmic inclusions of protein, known as Rosenthal fibers. GFAP has a typical type III intermediate filament domain structure, consisting of a highly conserved alpha-helical rod domain bracketed by an intrinsically disordered N-terminal head and C-terminal tail domains. While the general domain organization of monomeric GFAP and the assembly process for higher order quaternary structures are known, we lack an atomic resolution mechanistic understanding of GFAP assembly into mature filaments. Understanding the structure of GFAP filaments and how mutations disrupt this structure will provide vital insight into how mutations produce Alexander disease pathology. As a first step towards a mechanistic description, we characterized GFAP wild-type tetrameric and filamentous assemblies using solid state NMR and compared the results to those obtained from an assembly-deficient GFAP mutant. For wild-type GFAP, we observe surprisingly uniform rigid alpha-helical structure and can spectroscopically resolve highly mobile intrinsically disordered regions in the filament assemblies. Wild-type tetramers show increased mobility, likely arising from the head and tail domains. Mutation of the highly conserved cysteine at position 294 to serine results in an inability to form full-length filament assemblies. We show that the rigid regions of the C294S mutant assemblies largely remain structurally consistent with wild-type tetrameric assemblies but differ from wild-type filament assemblies. There is an increase in highly mobile regions for the C294S mutant relative to the wild type.
Our results provide a foundation for developing solid state NMR approaches to characterize intermediate filament assembly mechanisms and the interfering effect of disease mutations.
Weinert, T.; Standfuss, J.; Seidel, H. P.
Macromolecular crystallographic refinement underpins structural biology, yet existing software packages often lack accessible, modular codebases amenable to rapid method development. Here, we introduce TorchRef, a PyTorch-based crystallographic refinement framework that exposes all refinable parameters (atomic coordinates, displacement parameters, occupancies, and scale factors) to automatic differentiation. The framework implements FFT-based structure-factor calculations, the French-Wilson treatment of intensities, bulk-solvent modeling with established mask parameters, and stereochemical restraints from the CCP4 Monomer Library. A modular target architecture allows loss functions to be combined, weighted, and extended independently of the core refinement machinery. Validation against 1,000 PDB structures demonstrates that TorchRef-based refinement reproduces a median R-free within 1% of Phenix while maintaining comparable model quality. Structure factor calculation in TorchRef scales readily across multiple CPU cores and is over 100 times faster on modern GPUs than CCTBX. To showcase how modern methods like time-resolved crystallography can benefit from the flexibility that TorchRef provides, we implemented direct refinement of a typical time-resolved model against amplitude differences, a use case currently not explored by classic refinement programs. TorchRef is released under the MIT license with full API documentation and tutorials, providing an accessible platform for developing and testing new crystallographic refinement protocols. Synopsis: TorchRef is an open-source PyTorch-based crystallographic refinement framework that exposes all refinable parameters to automatic differentiation, delivers GPU-accelerated structure-factor evaluation more than 100x faster than CCTBX, and enables new workflows, such as direct refinement against amplitude differences in time-resolved crystallography.
Kim, A.-R.; Perrimon, N.
As protein structure prediction tools become widely adopted across biology, there is a growing need for accessible methods to assess and visualize predicted protein-protein interactions (PPIs). Here we present LIVIA (Local Interaction Visualization and Analysis), a browser-based tool that computes local PPI confidence metrics across multiple prediction platforms, identifies predicted interface residues, embeds an interactive Mol* 3D viewer, and generates visualization scripts for ChimeraX and PyMOL. The tool automatically detects prediction formats; all parsing and computation occur locally on the user's machine. LIVIA is freely available at https://flyark.github.io/LIVIA.
Fieux-Castagnet, A.; Waton, J.; Glukhonemykh, A.; Snow, E.; Ashokkumar, R.; Fleming, J.; Champagne, D.; Devenyns, T.; Peluffo, A.; Anagnostopoulos, C.
Protein structure prediction models (such as AlphaFold, Chai, Boltz) have transformed structural biology and are increasingly explored for drug discovery; however, their utility for large-scale screening of antibody-antigen (AB-AG) interactions remains unclear, particularly for distinguishing true binding from non-binding pairs at scale. To our knowledge, there has not been an exhaustive exploration of Boltz-2 inference settings on this high impact problem, and in this paper we set out to describe and implement a novel benchmarking framework that can accelerate progress in the field. We evaluated Boltz-2 (NVIDIA NIM implementation) on 519 therapeutic monoclonal antibodies from Thera-SAbDab, pairing each antibody with its cognate target and a randomly assigned non-cognate antigen. We developed a novel evaluation framework that systematically captures variability across stochastic seeds while benchmarking different inference settings, including datasets with and without crystallographically resolved antibody structures. Across settings, Boltz-2-derived confidence metrics showed weak, though above-chance, discrimination (0.5 < ROC-AUC < 0.60). Among evaluated metrics, the minimum value of the interface predicted TM-score (ipTM-min) across seed-samples captured the strongest signal. Interestingly, additional feature aggregation and multivariate modelling provided little to no improvement. Increasing the number of stochastic predictions yielded front-loaded gains, with diminishing returns beyond ~15-20 seed-samples, suggesting limited value of extensive sampling in practical workflows. Notably, inference without multiple sequence alignments (MSAs) slightly improved performance on non-crystallized antibodies (ΔAUROC ≈ +0.027) while reducing runtime by ~8 seconds per prediction compared to shallow MSA settings.
Overall, these results indicate that off-the-shelf confidence metrics from general-purpose structure prediction models may be insufficient for reliable target-antibody screening and highlight the need for task-specific optimization, while confirming that modest amounts of sampling can be helpful but are not in themselves sufficient to improve performance significantly, as gains plateau relatively quickly.
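The seed-aggregated metric and the ROC-AUC figure quoted in this abstract can be illustrated with a minimal sketch. It takes the minimum score across seed-samples for each antibody-antigen pair and then computes ROC-AUC as the probability that a cognate pair outscores a non-cognate pair. All score values and helper names below are hypothetical and invented for illustration; they are not data or code from the preprint.

```python
def min_over_seeds(per_seed_scores):
    """Collapse each pair's seed-sample scores to their minimum (ipTM-min style)."""
    return [min(seed_scores) for seed_scores in per_seed_scores]

def roc_auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative; ties count half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical ipTM values: one row per antibody-antigen pair, one column per seed-sample
cognate = [[0.62, 0.55, 0.70], [0.40, 0.35, 0.52], [0.81, 0.77, 0.79]]
non_cognate = [[0.58, 0.30, 0.66], [0.45, 0.41, 0.50], [0.72, 0.36, 0.69]]

auc = roc_auc(min_over_seeds(cognate), min_over_seeds(non_cognate))
```

An AUC of 0.5 corresponds to chance-level ranking, so values between 0.5 and 0.60 mean cognate pairs are ranked above non-cognate pairs only slightly more often than a coin flip would predict.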
Bellaiche, A.; Choudhary, P.; Nair, S.; Harrus, D.; Yu, C. W.-H.; Tanweer, S. A.; Evans, G. L.; Lo, S. W.; Martin, M.; Fleming, J. R.; Velankar, S.
Structure Integration with Function, Taxonomy and Sequences (SIFTS) provides residue-level mappings between UniProt Knowledgebase sequences and Protein Data Bank structures and has historically been generated through internal Protein Data Bank in Europe (PDBe) pipelines. Here, PDBe-SIFTS is presented as a fully open-source, locally deployable implementation of this mapping framework. The pipeline combines fast, scalable sequence search using MMseqs2, an improved bounded scoring scheme for ranking candidate mappings, and residue-level mapping refinement based on backbone connectivity. PDBe-SIFTS is distributed as a Python package with command-line tools for 1) building a sequence search database, 2) identifying the best sequence-structure match, 3) one-to-one mapping at the residue level, and 4) generating SIFTS annotations in PDBx/mmCIF format. Benchmarking on the complete Protein Data Bank archive showed that MMseqs2 reduced archive-scale UniProtKB searches from hours with BLASTP to minutes, approximately 22-36 times faster, while curated mappings were recovered at top rank in 93.1% of cases. The remaining discrepancies mainly involved biologically ambiguous cases such as highly conserved proteins, chimeric constructs, or closely related orthologs. These results show that PDBe-SIFTS enables fast mapping, improving structural coherence in residue-level alignments while delivering the most up-to-date and accurate mappings, comparable to expert curation. Tool: https://github.com/PDBeurope/SIFTS Quick start notebook with example: https://github.com/PDBeurope/SIFTS/tree/master/notebooks Broader audience statement: Matching protein sequences to their three-dimensional structures, and mapping annotations across both, is essential for understanding protein function, interactions, and molecular mechanisms. This integrated view enables richer interpretation of biological data and underpins advances in drug discovery, disease research, and protein engineering.
PDBe-SIFTS provides an open and functional framework for structure-sequence mapping, allowing researchers and databases to run, inspect, and extend these mappings locally, while benefiting from faster searches, transparent scoring, and structurally informed residue-level alignments.
Lin, Y.; Lee, M.; Vermani, A.; Jiang, E.; De Cooman, S.; Spetko, M.; AlQuraishi, M.
Despite the breakneck pace of progress in protein design methodology, frontier problems remain challenging, with leading methods struggling to design high-affinity binders, scaffold multiple functional motifs, or stabilize large multi-domain proteins. Recent research efforts have focused on two areas: improving model reasoning when generating active sites or binding interfaces, and improving concordance between the design process and the in silico oracle used to select promising designs. In addressing the first, the field has shifted towards all-atom models that capture sidechain conformations in atomistic detail by eschewing data-efficient SE(3)-equivariance, mirroring the evolution of AlphaFold2 to AlphaFold3. In addressing the second, recent work has focused on replacing generative models employing diffusion or flow-matching with hallucination approaches that directly optimize the oracle in sequence space; this improves success rates but reduces computational efficiency. Here, we close and surpass the generation-hallucination gap by revisiting SE(3)-equivariance using a branched polymer treatment of protein structures. The resulting diffusion model, Genie 3, achieves state-of-the-art performance on binder design, motif scaffolding, and unconditional generation, while being significantly faster than the best existing methods. We use Genie 3 to design a nanomolar binder of Nipah Glycoprotein G, a tetramer with minimal structural or biophysical characterization, as part of the Adaptyv Bio Nipah Competition, achieving a 12.5% success rate. Taken together, our results present a new frontier in protein design capability and a reexamination of the role of SE(3)-equivariance in molecular modeling.
Louet, A. A. B.; Hummer, G.; Vendruscolo, M.
Ligand binding to intrinsically disordered proteins resists description in terms of conventional binding pockets, yet it can be analysed as a dynamic process in which ligands move across transient surface interaction sites. Here we characterise a pathway-based representation in which ligand binding is described as a sequence of transitions between residue-defined microstates, enabling ligand-specific effects to be distinguished from intrinsic properties of the peptide conformational ensemble. Using all-atom molecular dynamics simulations of Aβ42 and the C-terminal region of α-synuclein in complex with chemically diverse small molecules, we construct transition matrices that encode ligand movement across the peptide surface and use Markov state models to identify dominant binding pathways and relative binding propensities. Pairwise enrichment-factor and AUC analyses reveal strong conservation of the highest-ranked pathways across chemically diverse ligands, with enrichment factors of 15-45 for the top-ranked states and AUC values typically ≥0.75, well above random expectation. These dominant pathways are also preserved across changes in pH and temperature, whereas a urea control, included as a non-specific binder, shows reduced enrichment, indicating that ligands primarily modulate pathway weights rather than define the underlying network topology. Ensemble docking across chemically diverse libraries further supports the presence of recurrent ligand-accessible hotspots within the peptide conformational ensemble. Building on this framework, we apply a prospective screening pipeline to Aβ42, combining MSM-derived hotspots with sequence-based Ligand-Transformer scoring and Gnina docking across 1.66 million compounds, to nominate 19 candidates for prospective experimental evaluation. Together, these results indicate that disordered protein sequences give rise to conformational ensembles that exhibit characteristic binding pathways for small molecules.
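The idea of encoding ligand movement as a transition matrix between microstates can be illustrated with a minimal sketch. It counts transitions in a discrete trajectory, row-normalizes the counts, and estimates the stationary distribution by power iteration. The three-state trajectory and function names are hypothetical, and this simple count-based estimate omits the lag-time selection and validation that a full Markov state model analysis would involve.

```python
def transition_matrix(trajectory, n_states):
    """Row-normalized transition counts between consecutive microstates."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(trajectory, trajectory[1:]):
        counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def stationary_distribution(matrix, iterations=500):
    """Power iteration: repeatedly propagate a uniform start through the matrix."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi

# Hypothetical trajectory: index of the residue-defined microstate the ligand
# contacts in each frame (0, 1, 2 are placeholder state labels)
trajectory = [0, 1, 1, 2, 1, 0, 1, 2, 2, 1, 0, 1]

T = transition_matrix(trajectory, 3)
pi = stationary_distribution(T)
```

The stationary distribution gives a rough relative occupancy for each microstate, and the largest off-diagonal entries of the matrix indicate the most frequent moves between sites.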
Porras, S. A.; Davis, S. J.; Paredes Trujillo, O. D.; Diep, P.; Schenk, G.; Boden, M.
Building diverse and informative protein sequence datasets is critical for understanding how function varies across sequence space. Because only a small fraction of sequences in a dataset can typically be experimentally characterised, strategies for selecting which sequences to characterise should maximise the information gained from each experiment. Here, we present TreeGazer, a phylogeny-informed framework that combines Bayesian optimisation with the topology of a tree to guide sequence selection. TreeGazer balances exploitation of sequences predicted to exhibit favourable properties against exploration of regions of higher model uncertainty. Unlike existing approaches that apply Bayesian optimisation for sequence selection, TreeGazer does not rely on black-box models and instead uses latent representations of property distributions that are directly tied to phylogenetic structure. Modelling properties in this way enables biologically interpretable predictions and uncertainty estimates. Across two simulated selection campaigns, TreeGazer consistently selected sequences that produced datasets more representative of the underlying property distribution than alternative strategies that used protein language models. TreeGazer also performed effectively in low-data settings, where tree-guided selection enabled accurate identification of functional transitions across clades. TreeGazer can be run on conventional laptop computers while still providing equivalent or superior performance to embedding-based approaches. These results demonstrate that phylogenetic structure is a powerful and underutilised prior for guiding informative sequence selection.
Freye, C.; Miller, B. G.
Multi-functionality in extant enzymes, including the ability to transform multiple substrates, is thought to arise, in part, from conformational flexibility. The hexokinase protein family represents a classic model system for investigating the relationship between substrate specificity and conformational change. Within this family, human glucokinase (hGCK) displays notable degrees of conformational heterogeneity, including an intrinsically disordered loop. The extent to which these structural features contribute to the breadth of hGCK's substrate scope is unknown. Here, we investigate the substrate specificities of extant and ancestral glucokinases that span the evolutionary emergence of conformational heterogeneity in this family. We show that extant hGCK catalyzes the ATP-dependent phosphorylation of glucose, 2-deoxyglucose, mannose, glucosamine, fructose, allose and galactose with catalytic efficiencies ranging from 6.3 × 10³ M⁻¹ s⁻¹ to 0.33 M⁻¹ s⁻¹. A glucokinase ancestor from early vertebrate evolution (vGCK), which also displays conformational heterogeneity and disorder, phosphorylates these same seven substrates with similar kcat/Km values. An antecedent, chordate glucokinase (cGCK), which displays reduced conformational heterogeneity and lacks intrinsic disorder, also transforms these same substrates, but with higher overall catalytic efficiencies and markedly lower Km values. Notably, however, the ratios of kcat/Km values for individual substrate pairs, which define specificity, are unchanged for all three enzymes. Our results demonstrate that substrate specificity is not correlated with conformational diversity in GCKs and support a model in which the differences in catalytic efficiencies of various substrates arise from differences in the ability to form the ground state enzyme-carbohydrate binary complex.
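The distinction between catalytic efficiency and specificity drawn in this abstract can be made concrete with a short worked sketch. Specificity between two substrates is the ratio of their kcat/Km values, so an enzyme that is uniformly more efficient toward every substrate has the same specificity. The two efficiencies below are the extremes quoted in the abstract; the tenfold scaling factor and the function name are hypothetical and chosen only to illustrate the arithmetic.

```python
import math

def specificity(efficiency_a, efficiency_b):
    """Specificity for substrate A over B: ratio of their kcat/Km values."""
    return efficiency_a / efficiency_b

# Extremes of the hGCK efficiency range quoted in the abstract, in M^-1 s^-1
best_efficiency = 6.3e3
worst_efficiency = 0.33

span = specificity(best_efficiency, worst_efficiency)

# Hypothetical ancestor that is uniformly 10x more efficient on both substrates
scale = 10.0
scaled_span = specificity(best_efficiency * scale, worst_efficiency * scale)

# The common factor cancels, so the ratio (the specificity) is unchanged
same_specificity = math.isclose(span, scaled_span, rel_tol=1e-9)
```

The quoted range therefore spans a specificity ratio of roughly 19,000 between the best and worst substrates, and any uniform gain in efficiency leaves that ratio untouched.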
Thelen, J.; Koenig, M.; Vuorte, M.; Liimatainen, J.; Javanainen, M.; Lolicato, F.
The plasma membrane is a laterally heterogeneous environment in which lipid organization plays a central role in regulating protein function. In model systems, this heterogeneity is often described in terms of coexisting liquid-ordered (Lo) and liquid-disordered (Ld) phases, commonly associated with the lipid raft concept. Despite extensive experimental and computational efforts, the molecular determinants governing protein partitioning between these domains remain poorly understood, largely due to the limited number of systems studied. Here, we address this challenge using a high-throughput computational approach, systematically analyzing the partitioning behavior of almost 5,000 helical transmembrane peptides in phase-separating lipid membranes. Across all simulations, we find that none of the peptides exhibit a clear preference for the Lo phase, while the vast majority partition into the Ld phase. This observation is consistent with experimental results in simplified membrane systems and suggests that commonly used ternary lipid mixtures may not fully capture the physicochemical environment governing protein sorting in biological membranes. In addition, we identify a subset of peptides that preferentially localize at the Lo/Ld interface. These interfacial peptides display distinct sequence characteristics, indicating that boundary localization is governed by specific combinations of residue composition and spatial arrangement rather than a single dominant feature. Overall, our results reveal that transmembrane helix partitioning in model membranes is dominated by a preference for disordered environments, with interfacial localization emerging as a distinct and potentially functional behavior.
Mead, E. H.; Batz, K. C.; Shih, K.-H.; Fleming, I. R.; Tesdahl, C. D.; Lizardos, L.; Armendariz, J. R.; Hannan, J. P.; Hickey, A. M.; Leyk, A.; Erbse, A. H.; Falke, J. J.
The three conventional isoforms of the Ras G-protein (H-, K-, N-Ras) function as molecular on-off switches that regulate a wide array of signaling pathways, including the Ras-PI3K-PIP3-PDK1-AKT pathway that is central to innate immunity and normal cell growth, and is dysregulated in many disease states. Activation of the pathway by Ras requires adequate Ras-PI3K binding affinity. Here we focus on the interface of known structure in the H-Ras:PI3Kγ co-complex essential to multiple pathways including directed pseudopod growth in leukocyte chemotaxis. At this interface 10 H-Ras residues, all 100% conserved between the H-, K- and N-Ras isoforms, contact the Ras binding domain of PI3Kγ (PI3KγRBD). To investigate the degree to which the native H-Ras:PI3KγRBD interface is optimized by evolution for maximal binding affinity, 8 interfacial Ras mutations selected from the COSMIC database and the literature were introduced at the contact positions. All 8 Ras mutations were observed to alter the H-Ras:PI3KγRBD binding affinity, with 4 mutations yielding significant affinity increases and 4 yielding significant affinity decreases. These findings indicate that the native H-Ras:PI3KγRBD interface provides intermediate, rather than maximal, binding affinity. Such intermediate affinity is consistent with the substantial binding plasticity of the conserved H-, N-, K-Ras effector docking surface, which has evolved to bind a diverse array of effectors. Furthermore, the findings provide evidence that COSMIC-linked mutations at the H-Ras:PI3KγRBD interface frequently generate affinity increases as well as decreases, with potential implications for molecular mechanisms of disease and for tool development in cell biology.
Senoner, T.; Vahidi, P.; Olenyi, T.; Senoner, F.; Sisman, G.; Kahl, E.; Rost, B.; Koludarov, I.
Protein Language Models (pLMs) generate per-protein embeddings that encode functional, structural, and evolutionary information, yet the relationships captured in these representations remain difficult to explore systematically. ProtSpace (https://protspace.app) is a web application for interactive visualization of pLM embedding spaces, enabling hypothesis generation directly in the browser without installation. Unlike traditional network-based tools that exclusively visualize amino acid sequence similarity, ProtSpace explores embedding spaces, revealing relationships often not captured by traditional comparisons. Users provide protein sequences or pre-computed embeddings through a Google Colab notebook or the Python CLI; the pipeline applies dimensionality reduction, retrieves 38 annotation types spanning UniProt, InterPro, NCBI Taxonomy, TED structural domains, and sequence-based predictors served via Biocentral, and produces a portable binary file for the browser-based viewer. WebGL-accelerated rendering supports interactive exploration of over 570,000 proteins. Distinctive features include per-point pie charts for multi-label annotations and integrated 3D structure viewing through AlphaFold2 predictions. All computation happens on the user's machine, ensuring data privacy. We demonstrate the utility of ProtSpace through a progressive zoom-in across biological scales: from global proteome organization of Swiss-Prot, through cross-species comparison revealing conserved and lineage-specific families, to functional hypothesis generation within the beta-lactamase superfamily. ProtSpace is freely available at https://protspace.app under the Apache 2.0 license.
Key Points:
- ProtSpace is a free, open-source web application that visualizes protein language model (pLM) embeddings as interactive maps, scaling to 570,000 proteins entirely client-side.
- A zero-installation Google Colab notebook and a Python CLI prepare visualization-ready bundles from FASTA files, UniProt queries, or pre-computed HDF5 embeddings, automatically retrieving 38 annotation types from five sources (UniProt, InterPro, NCBI Taxonomy, TED structural domains, and Biocentral sequence predictors) alongside custom CSV metadata.
- Application examples demonstrate that embedding visualizations generate testable biological hypotheses at multiple scales, from proteome-wide organization through species-level comparison to family-level functional discovery, and that these are complementary to traditional sequence-based analyses.
Joachimiak, A.; Tan, K.; O'Connor, K. A.; Zhou, X.; Gade, P.; Garcia, E.; Tan, A.; Nijhawan, A.; Endres, M.; Kim, Y.; Greenwood-Quaintance, K.; Patel, R.
Serine-aspartate repeat-containing protein D (SdrD) is a Staphylococcus aureus cell wall-anchored, calcium-binding adhesin member of the MSCRAMM Sdr subfamily that may contribute to bacterial adhesion and virulence. S. aureus is the most common cause of periprosthetic joint infection (PJI). Population-level distribution and sequence diversity of SdrD among clinical PJI isolates have not been systematically characterized, and the SdrD binding mechanism is still not well understood. To address these gaps, sdrD alleles were queried across 156 newly sequenced PJI isolates and compared to publicly available S. aureus genomes, and nucleotide- and protein-level phylogenies of the sdrCDE locus were constructed. The SdrD crystal structure from S. aureus JH1 was determined, along with solution small-angle X-ray scattering (SAXS), molecular dynamics (MD) simulations, and assessment of conformational changes upon calcium depletion. Three dominant sdrD subtypes were defined, associating with USA300, JH1, and TCH60; the JH1 sdrD subtype was predominant among PJI isolates. Structural studies showed that the conformation of individual domains and interdomain organization of the multidomain SdrD have limited flexibility in solution, and that the calcium-binding B domain retains its core fold under conditions of calcium depletion. Together, the findings presented support functional diversification among Sdr family members in mediating host attachment and inform a re-evaluation of the ligand-binding mechanism previously proposed for SdrD.
Author Summary: Staphylococcus aureus is the leading cause of infections that develop around joint implants (periprosthetic joint infection, PJI). This bacterium has a large arsenal of surface proteins that allow it to stick to human tissues and implanted devices. This work focused on one such protein, SdrD, which has been linked to implant-associated infections but whose structure and diversity among patients with PJI had not been well characterized.
The genetic sequences of SdrD were analyzed across thousands of bacterial genomes, including those from patients with PJI. Distinct genetic variants of the protein were found, one of which was particularly common with PJI. The three-dimensional structure of SdrD was determined at atomic resolution, and solution small-angle X-ray scattering (SAXS) and molecular dynamics were used to study how it moves and responds to changes in its environment. Contrary to what was previously described, SdrD was shown to be relatively rigid. These findings change how SdrD's mechanism of action should be considered, potentially informing design strategies to block bacterial attachment before infection takes hold.
Talpir, I.; Fleishman, S. J.
Computational protein design demands generally applicable models that reliably predict or generate unmeasured variants with superior functional properties. Although protein language models (pLMs) have been used in zero-shot and transfer-learning design studies, they have generally not been assessed in benchmarks that explicitly test combinatorial extrapolation from lower- to higher-order variants. Here we benchmark widely used pLMs against conventional baseline methods in recently described dense, experimentally validated multi-mutant landscapes. We find that regardless of architecture and parameter count, pLMs are statistically similar to one another, and none consistently outperforms conventional baseline methods. Furthermore, their ability to distinguish functional from non-functional variants in zero-shot prediction is comparable to that of conventional homology-based methods. We suggest that to contribute significantly to the design of protein function, pLMs may need to encode biophysical and structural priors or be combined with structure-based approaches.
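As an illustration of what "distinguishing functional from non-functional variants" can mean in practice, the sketch below computes a rank-based ROC AUC from per-variant scores. This is an assumption for illustration only: the abstract does not state which metric the authors used, and the function name `roc_auc`, the toy scores, and the toy labels are all hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Probability that a randomly chosen functional variant receives a
    higher score than a randomly chosen non-functional one (ties count 0.5).

    Illustrative metric only; not taken from the preprint.
    """
    functional = [s for s, y in zip(scores, labels) if y]
    non_functional = [s for s, y in zip(scores, labels) if not y]
    if not functional or not non_functional:
        raise ValueError("need at least one variant of each class")

    wins = 0.0
    for f in functional:
        for n in non_functional:
            if f > n:
                wins += 1.0
            elif f == n:
                wins += 0.5
    return wins / (len(functional) * len(non_functional))


# Hypothetical zero-shot scores (e.g. log-likelihood ratios) and labels.
toy_scores = [-1.2, -0.3, -2.5, -0.1, -3.0]
toy_labels = [True, False, False, True, True]
print(f"AUC on toy data: {roc_auc(toy_scores, toy_labels):.2f}")
```

Under this kind of metric, a value near 0.5 indicates no discrimination and a value near 1.0 indicates near-perfect ranking, so two scoring methods can be compared on the same labeled variant set.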